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ABSTRACT 

We consider the problem of determining the Galaxy's gravitational potential from 
a star catalogue. We show that orbit-based approaches to this problem suffer from 
unacceptable numerical noise deriving from the use of only a finite number of orbits. 
An alternative approach, which requires an ability to determine the model's phase- 
space density at predetermined positions and velocities, has a level of numerical noise 
that lies well below the intrinsic uncertainty associated with the finite size of the 
catalogue analysed. A catalogue of 10 000 stars brighter than V = 17 and distributed 
over the sky at b > 30 degrees enables us to determine the scaleheight of the disc that 
contributes to the potential with an uncertainty below 20 pc if the catalogue gives 
proper motions, line-of-sight velocities and parallaxes with errors typical of the Gaia 
Catalogue, rising to 36 pc if only proper motions are available. The uncertainty in the 
disc's scalelength is significantly smaller than 0.25 kpc. 

Key words: Galaxy: kinematics and dynamics - Galaxy: structure - methods: data 
analysis 



1 INTRODUCTION 

Very large resources are currently being devoted to surveys 
of our Galaxy, both from the ground (e.g. APOGEE, RAVE 
and Gaia-ESO: Eisenstein et al. 2011; Steinmetz et al. 2006; 
Gilmore et al. 2012) and from space (e.g. Gaia: Perryman 
et al. 2001). These surveys are being undertaken in the ex- 
pectation that they will reveal the current kinematic and 
chemical structure of our Galaxy and how our Galaxy was 
assembled. The latter is a pivotal question for cosmology 
since our Galaxy is typical of the galaxies that dominate 
star formation in the contemporary Universe, and the con- 
sensus ACDM cosmology has given us a moderately clear 
view of how such galaxies formed. 

Since stars and dark matter orbit freely in the Galaxy's 
gravitational potential, our understanding of the Galaxy can 
never be better than our knowledge of its gravitational po- 
tential c l > (x). Since even at the Sun we are unable to mea- 
sure the density of dark matter, $ cannot be determined 
from Poisson's equation, but must be constrained by mea- 
suring the motions of objects that we can see - in practice 
stars and interstellar gas. Historically, the crucial data have 
been the circular speed in the plane, v c (R), estimated from 
the line-of-sight velocities of interstellar gas (Malhotra 
1994a), the local density p(Ro) estimated from the kinemat- 
ics of nearby stars (Creze et al. 1998; Holmberg & Flynn 
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2004), the local surface density Si.i estimated from obser- 
vations of stars a few hundred parsecs away from the plane 
(Kuijken & Gilmore 1989), and the proper motion of Sagit- 
tarius A* at the Galactic centre (Reid & Brunthaler 2004; 
Gillessen et al. 2009). One can find models of the Galac- 
tic potential which fit all of these constraints (e.g. Dehnen 
& Binney 1998; Klypin, Zhao, & Somerville 2002; McMillan 
2011), but unfortunately they do not constrain the density of 
dark matter very strongly. The data that are now available 
to us, or will shortly become available, open new horizons 
in the determination of $. 

The questions we address are: 

• given a star catalogue, what is the best way to constrain 
$? 

• When we proceed in the optimal way, how strongly do 
data of the type that will shortly be available constrain $? 

The data available from surveys of the Galaxy funda- 
mentally differs from that generally available for external 
galaxies. It is practical to determine not just the position on 
the sky and line-of-sight velocity of a star, but also proper 
motions and distance from the Sun with a useful degree of 
accuracy. However, there is a strong bias in which stars are 
observed by the survey - those sufficiently close to the Sun's 
position in the Galaxy and which lie within the selection 
function of the survey. For observations of external galaxies 
there is no such bias created by the Sun's position, and full 
surface brightness profiles can be found, but the velocity in- 



© 0000 RAS 



2 P. J. McMillan and J. J. Binney 



formation is generally limited to binned line-of-sight velocity 
distributions. Ensuring that models can accurately represent 
data of the precision available for stars in the Galaxy is a 
significant problem. 

A compelling argument can be made that the availabil- 
ity of models of sufficient sophistication is the key to ex- 
tracting science from a star catalogue (McMillan & Binney 
2012, henceforth Paper I). One must discriminate between 
various model Galaxies by inferring their a posteriori prob- 
abilities, given the catalogue. The simplest possible dynam- 
ical model Galaxy is defined by the pair of functions (/, $), 
where /(x,v) is the distribution function (df). 

When estimating $ an unavoidable assumption is that 
the Galaxy is in a steady state: without this assumption 
the observed kinematics of any tracer objects are consistent 
with any potential; it is only by assuming that $ is deep 
enough to prevent a tracer population expanding, and not so 
deep as to cause contraction in the next dynamical time that 
we can constrain The assumption that our tracers are in 
dynamical equilibrium permits us to invoke the strong Jeans 
theorem and conclude that the df / of our tracer population 
is a function f{h,l2, • • •) of whatever isolating integrals $ 
admits. In the 1970s numerical experiments revealed that 
typical galaxy potentials admit three independent isolating 
integrals, for example, energy E = ^v 2 + the component 
of angular momentum L z about the potential's symmetry 
axis and a "third integral" I3, which controls the division 
of energy in excess of the energy E C (L Z ) of a circular orbit 
of angular momentum L z between oscillation in radius and 
oscillation perpendicular to the potential's equatorial plane. 

Any function of isolating integrals is itself an isolating 
integral, so there is considerable flexibility in the choice of 
arguments of the df. We have argued elsewhere (McMillan & 
Binney 2008; Binney 2010) that there are compelling reasons 
to choose the actions J r , = L z and J z as our isolating 
integrals. So we work with these integrals here and take a 
model galaxy to be the pair of functions (/(J), <fr(x)). How- 
ever, working with alternative integrals would not change 
our conclusions in any essential way; alternative integrals 
would simply make the computations harder and less trans- 
parent. In particular, our arguments apply to models of the 
type that are most widely used in studies of external galax- 
ies: Schwarzschild (1979) models. These models comprise a 
potential $ and an orbit library, each element of which is 
the time series of phase-space coordinates (x(t),v(£)) ob- 
tained by integrating the equations of motion in $(x) for a 
particular initial condition, and a weight w ^ with which 
that time series is employed in the model. In effect these 
models use the initial conditions of integrations as isolating 
integrals, and the weights w are surrogates for the value of 
the df on the given initial conditions. 

In general terms the procedure for modelling a cata- 
logue is to determine the probability of the data given some 
pair (/, <3>) and then to use Bayes' theorem to convert the 
probability of the data into the probability of the pair. Ide- 
ally, the probabilities of every plausible pair (/, $) would 
be determined, but in practice one has to be content with a 
search for a limited number of more likely pairs. The problem 
is made computationally tractable by considering each can- 
didate potential in turn, and then finding the most probable 
companion df. In Paper I we showed that when a catalogue 
of 10 000 stars is constructed in a known potential from a df 



of a given functional form, the df can be recovered to good 
precision from the catalogue. In this paper we explore our 
ability to determine the correct potential by repeating the 
DF-fitting step for a series of potentials and identifying the 
potential that yields the largest likelihood for the catalogue. 

We show that this problem cannot be efficiently solved 
by any orbit-based technique such as N-body modelling, 
Schwarzschild modelling, or torus modelling of the type used 
in Paper I. We show further that the problem can be solved 
if we have available expressions for the isolating integrals as 
functions of the conventional phase-space variables, for ex- 
ample J(x,v). This second approach was adopted by Ting 
et al. (2012), and we extend it to include both observational 
errors and realistic selection effects, and to exploit the more 
powerful approach to the determination of actions of Binney 
(2012a). 

The outline of the paper is as follows. In Section 2 we 
describe the catalogues that we consider, the models that 
we compare them to, and the tools used to do the compar- 
ison. In Section 3 we explain the two competing methods 
we use to analyse the data. In Section 4 we show that orbit- 
based methods are ill-suited to this analysis. In Section 5 wc 
demonstrate methods using expressions for J(x,v) that are 
capable of successfully determining the true potential. 



2 THEORETICAL FRAMEWORK 

Three actions J; and three conjugate angle coordinates 9i 
provide exceptionally convenient coordinates for objects or- 
biting in a stationary gravitational potential The actions 
are conserved quantities and the angles increase linearly 
with time, 0i(t) = 0i(0) + ili(J)t, where Qi is a frequency. 

Thus J labels an orbit and 8 specifies a point on that 
orbit. The usual phase space coordinates x, v are 27r-periodic 
in each angle coordinate Angle-action coordinates have 
the convenient property that 



so it is simple to relate a density in angle-action space to 
the phase-space density /(x, v). 

The relationship (6, J) (x, v) depends upon the grav- 
itational potential <E>. Unfortunately analytical expressions 
for this relationship are only known for a very limited set 
of potentials. In recent years a great deal of effort has gone 
into developing numerical approximations to this relation- 
ship (McMillan & Binney 2008; Binney 2010; Binney & 
McMillan 2011; Sanders 2012a; Binney 2012a). In this paper 
we first use the machinery described in McMillan & Binney 
(2008) that yields (x(0, J), v(#, J)) and then the machinery 
described in Binney (2012a), which gives the inverse trans- 
formation (<?(x, v), J(x, v)). 

2.1 Torus modelling 

Torus modelling (McMillan & Binney 2008, and references 
therein) is a method which, for a single value J in a given 
potential provides (through a numerical minimisation) an 
expression for the phase-space coordinates (x, v) in terms of 
6. Thus it tells us the complete phase-space structure of the 
specified orbit J. We refer to this model of an orbit as a 
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"torus" because the three-dimensional surface mapped out 
in phase space as the 0, vary over their full range (0, 2tt) is 
isomorphic to a 3-torus. 

Thus with this method it is easy to find (x, v) in terms 
of (9, J), but far harder to find (8, J) given (x,v) (though 
it can be done iteratively, e.g. McMillan & Binney 2008). 

Torus modelling is best understood as an extension of 
Schwarzschild modelling (Schwarzschild 1979) in which time 
series are replaced by orbital tori. This replacement brings 
a number of advantages (Binney & McMillan 2011). Torus 
modelling was the method used to in Paper I, and has also 
been used in modelling the Hyades moving group (McMillan 
2013) and to show how one can disentangle the history of 
a disrupted satellite object that we observe as debris in the 
Solar neighbourhood (McMillan & Binney 2008). 



2.2 Stackel approximation 

Binney (2012a) introduced an algorithm for calculating the 
actions of stars with known phase-space coordinates in ax- 
isymmetric potentials. It is based upon the approximation 
that in the region probed by a given orbit, the potential of 
interest does not differ greatly from a Stackel potential (e.g. 
de Zeeuw 1985; Binney & Tremaine 2008, §3.5.3). With this 
assumption it is possible to estimate the radial and vertical 
actions J r and J z at any point (x, v) from one-dimensional 
integrals over coordinates that form a system of confocal el- 
lipsoidal coordinates. Thus when this apparatus is used, it 
is easy to obtain (0, J) from (x, v) but hard to proceed in 
the opposite direction. 

This approximation gives values of J for typical orbits 
in the thin disc which are around a factor of 4 more accurate 
than those found by the "adiabatic approximation" which 
has been used for the same purpose (e.g. Binney 2010; Ting 
et al. 2012). It provides an even greater improvement in 
accuracy for the orbits of many thick-disc stars, to which the 
adiabatic approximation does not apply. It has been used to 
fit dfs to observational data for the Solar neighbourhood, 
given assumed gravitational potentials (Binney 2012b). 



2.3 Distribution functions 

The distribution functions used in this paper are all based 
upon the "quasi-isothermal" df (Binney & McMillan 2011). 
Here we modify the notation used previously in two respects: 
(i) we change the normalisation of / so when integrated over 
all phase space it produces unity, and (ii) we replace the 
parameter q by R a = Rd/q, where J? CT is the radial scale on 
which the velocity dispersions decline with increasing radius, 
and _R d is the conventional scalelength of the approximately 
isothermal disc. 



/(J) 



cut(L z 



(2) 



with J = (J r , J Z ,L Z ). Here £l(L z ) is the circular frequency 
for angular momentum L z , n(L z ) is the radial epicycle 
frequency and v(L z ) is its vertical counterpart. S(L Z ) = 
£oe~ flc / fld is the (approximate) radial surface-density pro- 
file, where R C (L Z ) is the radius of the circular orbit with 



angular momentum L z , and M = 27rEo-Rd is a constant 
included to ensure that 



/ 



d 3 J/(J) = l. 



(3) 



The factor cut(L z ) is included to ensure that we do not have 
equal numbers of stars rotating in each direction. We use 



cut(L z ) = i [1 +tanh(L*/L )] 



(4) 



where the value of Lo is unimportant in this study pro- 
vided it is small compared to the angular momentum of the 
Sun. We hold it fixed at Lo = lOkpckms" 1 . The functions 
a z (L z ) and a r (L z ) control the vertical and radial velocity 
dispersions. We adopt 

a r (L z ) = a r0 e (fl »- Bc>/fl " 

a z {L K ) = a z0 e( R °- H ° )/R °, (5) 

where ovo and a z o are parameters that are set to values 
close to the radial and vertical velocity dispersions at the 
Sun. The observed insensitivity to radius of the scaleheights 
of extragalactic discs suggests R a ~ 2i?d, if Ra is the scale- 
length of the disc that dominates the potential. To simplify 
calculations we hold R a — i?d/0.45 (as in Paper I). 

Binney (2012b) showed that by superposing a large 
number of quasi-isothermal dfs, one can obtain a model 
that is consistent with the local stellar density and velocity 
distribution as revealed by the Geneva-Copenhagen survey 
(Holmberg et al. 2009). In this study we, as in Paper I, 
restrict ourselves to simple two-disc models in order to pro- 
vide some straightforward demonstrations of the principles 
involved. These are of the form 



/(J) = (l-A)/thin(J)+A/ t hick(J) 



(6) 



with /thin and /thick of the form given in equation 2, and A 
the fraction in the thick disc. Extending this work to more 
complicated dfs is, in principle, straightforward. 

2.4 Numerical details of models used in this study 

The models we use to test our analysis techniques are dis- 
crete realisations obtained by sampling a df that has two 
quasi-isothermal discs with parameters as listed in Table 2. 
We refer to this df as /true- This is identical to one of the dfs 
used in Paper I and it is sampled using the torus machinery 
as described in Paper I. 

In all cases the model is constructed in the "convenient" 
Milky Way potential given by McMillan (2011), which we 
will refer to as the "Mcll" potential. This is an axisymmet- 
ric model, in which the potential is assumed to be produced 
by a Galactic bulge, thin and thick exponential discs, and a 
spherical Navarro, Frenk, & White (1996) halo. The density 
of the bulge is 

pb= (l/rVro)° eXP [" ( r >cu t ) 2 ], « 
where, in cylindrical coordinates, 

r' = ^WTWoY (8) 

with q = 1.8, ro = 0.075kpc, r cu t = 2.1kpc, and axis ratio 
q = 0.5; the densities of the two discs are of the form 



Pd(R,z) = exp 



.M 

2d 



R_ 

Rd 



(9) 
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Table 2. Parameters of the DF used to construct our catalogues, 
/true- The fraction in the thick disc component is A = 0.23. 



Disc 




oyo 






(kpc) 


(kms" 1 ) 


(kms" 1 ) 


Thin 


3.0 


27 


20 


Thick 


3.5 


48 


44 



with scaleheight z d , scalelength R d and central surface den- 
sity E d ,o; and the density of the halo is of the form 



Ph = 



Ph,0 



x (1 + x) 2 



(10) 



where x = r/rh, with rh the scale-radius. The parameters of 
the Mcll model are shown in Table 1. 

We also consider a range of other potentials, to show 
how well our analysis techniques work when asked to com- 
pare the likelihoods of competing plausible potentials. We 
perform a number of tests in which all parameters are held 
constant at those of Mcll, except the disc scaleheights - 
in these cases the ratio of the two disc scaleheights is held 
constant, and the quoted value is that of the thin-disc scale- 
height. 

We also perform tests in which the disc scalelengths are 
varied. Again we hold the ratio of the two disc scalelengths 
constant. However, now leaving the other parameters of the 
potential unchanged would lead to substantial changes in the 
properties of the model (for example in the circular speed), 
so we constrain the parameters of each potential in the same 
way as in McMillan (2011), with the Sun's position and the 
disc scaleheights being held constant. 

The constraints on the potentials employed include 
the proper motion of Sgr A* (Reid & Brunthaler 2004). 
Given the Sun's Galactocentric distance (8.5 kpc) and pe- 
culiar motion with respect to the local standard of rest, 
(Schonrich, Binney, & Dehnen 2010) the proper motion of 
Sgr A* strongly constrains the local circular speed. We also 
fit the potentials to the terminal velocity of the ISM at 
30° < I < 90° (Malhotra 1994b, 1995) and to observed 
maser sources (e.g. McMillan & Binney 2010) which con- 
strain the shape of the circular-speed curve, and to the ver- 
tical force 1.1 kpc from the plane at the Solar radius (Kuijken 
& Gilmore 1991). For full details of the constraints applied, 
see McMillan (2011). 

Constructing each potential in this way ensures that we 
are comparing potentials that are all good fits to existing 
basic kinematic data, which are currently state-of-the-art 
constraints on the Galactic potential. The results of this 
analysis therefore show how these dynamical models can in- 
crease our knowledge of the potential. 

The parameters of these potentials are given in the ap- 
pendix, Table 4, and we refer to them in the text by their 
thin disc scalelengths. 

We consider catalogues of "observations" of the discrete 
realisations. In general, a catalogue of N stars gives accurate 
values of the Galactic coordinates (b,l), values of apparent 
magnitude m, colour V — I, and line-of-sight velocity v\\ 
that have moderate errors, and values of the parallax vj, 
proper motion fj,, surface gravity g and metallicity Z that 




Figure 1. Luminosity function given by equation (13), and used 
in all tests. 



are probably significantly in error. We group the variables 
into two sets, the basic variables 



u = (b, I, m, zu, fx, v\\) 

and additional astrophysical variables 

B =(V-I,g,Z). 



(11) 



(12) 



Note that u has seven components, effectively a star's phase- 
space coordinates (x, v) and its apparent magnitude m. For 
now we neglect interstellar extinction. Then a star's abso- 
lute magnitude M is effectively specified by u because its 
distance is fixed by x. 

As in Paper I, we restrict ourselves to the case of a 
single stellar population. This assumption ensures that there 
are no correlations between stellar type and kinematics: the 
distribution of stars in phase space is independent of their 
luminosities, colours, metallicities, etc. In this case we can 
confine discussion to the components of u and neglect s. We 
further assume that the luminosity function F(M) is known 
to be 

(-14.9 + 21A/-5.4M 2 
+0.59 M 3 - 0.019 M 4 for 1 < M < 19 (13) 
otherwise, 

which is a simple polynomial approximation to the general 
V-band luminosity function described in Binney & Merri- 
field (1998), Table 3.16. This function is plotted in Figure 1 
and satisfies the normalisation condition 



r 

j — < 



dMF(M). 



(14) 



Throughout this paper we use the notation F(m, s) to mean 
the luminosity function of an apparent magnitude m at a 
heliocentric distance s where, as we neglect extinction, 1 



F(m,s) = F(m - 51og(s/10pc)). 



(15) 



We consider three catalogues of "observations" that give 
us information on the variables u° for each star a. In each 
case the catalogue contains N a — 10 000 stars, all at Galac- 
tocentric latitude b > 30° and with apparent magnitude 
m < 17. We assume that no other selection effects bias the 



Here and throughout this paper we use log to mean log 10 . 
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Sd,0,thin 


^d.thin 


2d, thin 


Sd,0, thick 


Rd, thick 


2d, thick Pb,0 


Ph,0 


rh 


753.0 M Q pc~ 2 


3kpc 


0.3 kpc 


182.OM pc- 2 


3.5 kpc 


0.9kpc 94.1 Mqpc" 3 


0.0125 M pc" 3 


17 kpc 



Table 1. Parameters of the Mcll model used in the construction of the particle model. Note that the potentials with differing scalehcights 
used in this study have identical parameters except the two disc scaleheights (which are held with Zd,thick/2d,thin = 3, with the quoted 
scalchcight always being that of the thin disc). The potentials with varying Rd have different parameters, listed in Table 4. 
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Figure 2. Histograms of the number of stars in the catalogue as 
a function of Galactocentric radius (left) and distance from the 
Galactic plane (right). 



3 ASSESSING A MODEL LIKELIHOOD: TWO 
APPROACHES 

3.1 General notation 

Before describing our new method of solving the problem 
of determining model likelihoods, we explain our notation 
(which is very similar to that used in Paper I). We assume 
that the errors in the observed quantities are independent 
and can be modelled by Gaussian probability distributions 

G(u, u, a) = ^=L= e - ( "-" )2/2,T2 . (16) 



We attach primes to the true values of measured quantities 
to distinguish them from the measured values. For brevity 
we use the notation 



(17) 



catalogue. For each observed star we assume that the quoted 
Galactic coordinates (b, I) are exact, as is the apparent mag- 
nitude m (this is equivalent to the statement that the uncer- 
tainty in m is much smaller than the scale on which F(M) 
varies). We then have: 

• A catalogue with measurements of parallax with un- 
certainty a m — 0.2 mas, measurements of line-of-sight ve- 
locity with uncertainty cry = 5kms _1 , and measurements 
of proper motion with uncertainty (in each direction) of 
ct m = 0.2 masyr -1 . 

• A catalogue with parallax and proper motion measure- 
ments with the same uncertainty as previously, but with no 
measurement of the line-of-sight velocity. 

• A catalogue with proper motion measurements with the 
same uncertainty as previously, but with no line-of-sight ve- 
locity or parallax measurements. 

To convert values from Galactocentric coordinates to 
Heliocentric coordinates u (to create this catalogue), or vice 
versa (to analyse the catalogue) we need to assume a posi- 
tion and velocity for the Sun. In all cases we assume that 
the Sun is at a Galactocentric radius Ro = 8.5 kpc moving 
with the peculiar velocity found by Schonrich, Binney, & 
Dehnen (2010) with respect to the local standard of rest in 
the currently hypothesised potential. 

In Figure 2 we show histograms of the number of stars 
in the catalogue as a function of Galactocentric R and z. 
We use the true positions of the stars to produce these his- 
tograms, rather than adding any uncertainties. 



We find the likelihood C of a model as a product over 
stars of the probabilities of measuring the values u° given 
the model: 

C = Y[C" = Y[P{^ a \MoAe\) 

a a 

= n / d7u ' g i( u "' u '> <t ") x p ( u 'i M ° de i)- (!8) 

Any quantity that is not given in the catalogue (such as 
v\\ in two of our catalogues) can be considered to have a 
sufficiently large a that the Gaussian density is effectively 
constant for all relevant values of the variable. 

The probability P(u'|Model) d 7 u' is the probability 
that a randomly chosen star in the catalogue has the true 
values u', for a particular model. Specifically P(u') is given 
by 

P(u'|Model) = AS(u')F(M)/(x,v) , (19) 

where the selection function S(u') is the probability that if 
a star with observables u' exists it will be included in the 
catalogue, and the normalisation factor A depends on the 
df, the luminosity function and the survey selection effects 
via the equation 

1/A = J d 3 xd 3 v/(x,v) j dMS(x,v,M)F(M). (20) 

As we assume that the df /(x, v) is that of an equilibrium 
dynamical model, we have 

J d 3 x d 3 v /(x, v) = (2tt) 3 J d 3 J /(J) = 1, (21) 

where the integrals are over all phase space and action space, 
respectively. 

Our catalogue contains stars at Galactic latitudes b > 
friim and apparent magnitude m < mii m with 6n m = 30° and 
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mii m = 17. We assume that, other than these limits, the 
probability of a star entering the catalogue is independent 
of its properties, i.e. 



S(u') = 



const for b > bn m , m < mii m 
otherwise. 



(22) 



Note that as this constant then appears in both the numer- 
ator and denominator (implicitly) of eq. (19), we can ignore 
it. 

Since the potential $ determines the relationship be- 
tween /(x, v) and /(J), the normalisation factor A (eq. 20) 
is a function of <E> in addition to /(J), so we might write 
P(u'!Model) = P(u'\f, $, F, S). 

Formally, the best way to constrain the potential is to 
marginalise over all possible dfs (Magorrian 2006, 2013). 
While our use of dfs that are functions of J opens up this 
exciting possibility, we do not attempt it in this study. Ting 
et al. (2012) marginalised over the parameters of a sin- 
gle pseudo-isothermal df. This relatively low-dimensional 
marginalisation still required at least 1 to 2 orders of mag- 
nitude longer than simply finding the maximum likelihood 
for a df of the assumed form in a given potential. They found 
that this marginalisation made little difference to their re- 
sults (Ting, priv. comm.). In this study we only consider 
the maximum likelihood for a df of the assumed form in 
the chosen <£>. This is very much like the approach used by 
Schwarzschild modellers. 



3.2 Evaluating with x(0, J) 

We first consider the approach to likelihood evaluation that 
was successfully employed in Paper I. This is founded on a 
library of tori, each of which yields (x(0, J), v(0, J)). The 
problems we encounter would be at least as serious if we 
were to replace the torus library by an orbit library of the 
type introduced by Schwarzschild (1979). Our description of 
the use of tori will be brief; a reader looking for more detail 
should read Sections 4 & 5 of Paper I. 

A torus provides complete knowledge of the orbit with 
actions J in a given potential . We therefore convert the 
integrals in equations (18) & (20) into integrals over (9, J). 
The normalisation factor A (eq 20) is rewritten as 



1/A= / /(JMJ)dJ, 



where 



0(J) 



dmF(m, s) S(x, v, m). 



(23) 



(24) 



Note that </>(J) depends on the potential. Regions of action 
space with 4>(J) — (in a given potential) do not contribute 
to the calculation and can be ignored - though the analytic 
forms we use for the df do make predictions of the df outside 
the survey volume, which can then be tested by further data. 

As in Paper I we use the approximations that the mea- 
surements of b, I and m are exact to perform three of the 
seven integrals in equation (18), leaving integrals over J and 



s' (i.e. along the line of sight): 

£1 = Ajd A Jf(J) Jd A edMGl(u a ,u',cr a )F(M)S{u) 
= V d3j / dS '' 



d(b,i,s>) 

x Gl(u a ,u',er a )F(m,s')S(u), (25) 

where u'(8, J, m) and is now a function of s' because (b, I) 
are known. 

The principle of Monte-Carlo integration is now invoked 
to convert these two integrals into sums over points J*, that 
have been selected with a sampling density /s(J) that will 
be described below. Then we have 



fs(J k )' 



(26) 



where we have used the notation <j)k = 0(Jfc), and introduced 
the line-of-sight integral for a given and observation a 



LOSI fc , c 



/ 



da' 



8(6) 



8{b,l,s>) 



F(m, s')Gl(u a , u', cr a )S(u').(27) 



The computation of these Nk x N a line-of-sight integrals 
dominates the computing budget for these calculations. Note 
that given our choice of selection function (eq. 22), we clearly 

have S(u') = const / in all cases. The Jacobian j g^fl^ | 
can be found using the torus machinery (Binney & McMillan 
2011), and is closely related to the density of the orbit. 

This approach allows us to reuse the values LOSIfc ja 
and (f>k that we have determined for the calculation of £ for 
many dfs in a given potential, but note that both depend 
critically on the relationship between u' and (0,3), which 
depends on the potential, so they cannot be reused when we 
move to a new potential. 



3.3 Evaluating with J(x, v) 

Given that we have expressions for J in terms of (x, v), 
which we find using the Stackel approximation, we can eval- 
uate the normalisation factor A directly from equation (20) 
rather than its angle-action reformulation (23). We use the 
sampling density /s(x, v) described below to ensure the 
points are concentrated where the integrand is largest. Then 
we have to evaluate the Monte-Carlo sum 

V^=4E ^rr^T / dmF(m,r fc )S(x fc ,v fc ,m).(28) 

N £rj tsyx-k, Vfc) J 

Since 1/A is a factor of every star's likelihood £", the total 
likelihood £ oc ^4^^*. Consequently, if we have a fractional 
error of S in the value of A, the error in log£ is ~ 0.43 iV* 6, 
so for our catalogues of 10 000 stars, we would require an un- 
certainty of ~ 0.02 per cent in A to limit the uncertainty in 
log C to order unity. Numerical experiments to be described 
below reveal that with 4 x 10 6 sample points the uncertainty 
in A is ~ 0.4 per cent, ~ 10 9 sample points are required to 
determine the log £ to order unity. This is a very challenging 
requirement! 

Fortunately the absolute value of £ is not important. 
What matters is the ratio of two given values of £. We 
can minimise the numerical noise in this ratio by fixing the 
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points (xfc,Vfc) at which we evaluate the Monte-Carlo sum 
(cq 28) for all models, independently of the potential. 

The great advantage of using J(x, v) rather than x(0, J) 
is that it is now possible to derive a Monte-Carlo sum for 
the likelihood in eq. (18) that runs over observable points 
u' that are determined by the observations and, again, are 
independent of the potential. Moreover we will be able to 
arrange that each star's sample points will be concentrated 
within its error ellipsoid, rather than distributed regardless 
of where the star is observed. 

We start by writing the integral associated with the ath 
star as 



d{0,J,M) 



I d(u>) 
x Gl(u a ,u',<T a )f(J)F(M)S(u') 



were the Jacobian is simply 



d(0' , J', M') 



a(u') 



6 cos 6 

= S COS = s- . 



(29) 



(30) 



Then for each star we choose a sampling density £(u'|u a ) 
that causes the sample points to be concentrated in the re- 
gion of u' space that dominates the integral. In the case of 
small errors, this region is the inner ~ 3a of the error ellip- 
soid. In the case of serious errors - for example if a variable 
such as «|| has not been measured or the measured value 
of zu is negative - this region is a subset of the error ellip- 
soid that is consistent with reasonable expectations of how 
stars are distributed in phase space. Again assuming that 
the measured values of l a ,b a and m a are exact, we are led 
to the sampling density 



£(u'|u Q ) = C 



for il,b)^ (l a ,b a ) 
8(0', 3', M' 



d(u') 

x/ s (x',v')F(m Q , S ) 



Gj(u Q ,u',eO (31) 
otherwise, 



where C a is the normalising constant. With this sampling 
density the integral (eq. 29) reduces to the Monte-Carlo sum 



C a = 



A 



E 



(32) 



where (x' fe ,v' fc ) is determined from u' k , and J J. = J(x' k ,v' k ). 
We note that C a is independent of /(J) and so it does 
not vary as we explore different dfs and potentials. Since 
we are only ever interested in the ratio between likelihoods 
calculated for different dfs and potentials, we do not need 
to compute C a . 

3.4 Choice of sampling density 

A good choice of the sampling density /s(x, v) used in these 
calculations reduces numerical noise by making the individ- 
ual contributions to the Monte-Carlo sums as nearly equal 
as possible. We achieve this goal by choosing /s(x, v) to be a 
good guess at the phase-space distribution of the population 
that our catalogue samples. It is important to ensure that 
there are no points with fs <SC /, as these will then domi- 
nate the integrals (eqs. 28 & 32), making them very noisy. 
We have based these on a product of a double-exponential 
density in real space with a triaxial Gaussian velocity dis- 
tribution. The principal axes of the velocity distribution are 



aligned with the R, z and <f> directions, with dispersions that 
vary in proportion to exp(— R/8 kpc). The means of the vr 
and v z components are zero, while the mean of v$ is 

(v^) =v c - Va(-R) (33) 

with constant v c = 245 kms -1 , and asymmetric drift veloc- 
ity v a (R) oc a 2 . 

We use the sum of two such discs (approximating the 
thin and thick discs) as fs- We set the dispersions at the 
Solar radius <7r,© = a Zl @ = SOkms" 1 , ct^q = 40kms~ 1 , 
and v a ,0 = 15kms _1 for a disc with scaleheight 0.3 kpc; 
and <7r,q — o" Zj0 = 50 kms -1 , ct^q = 60kms -1 , and v a ,Q = 
30kms _1 for a disc with scaleheight 1 kpc. 30 per cent of fs 
is associated with the thick disc component, and the rest 
with the thin disc. 



4 THE PROBLEM WITH ORBIT-BASED 
METHODS 

To understand why orbit-based methods are extremely ill- 
suited to determining the likelihood of a stellar catalogue, 
we need to look at the numerator of equation (26). The 
main calculation for each star a is finding the line-of-sight 
integral LOSIfe, a for each orbit used in the Monte-Carlo in- 
tegration. These are then summed (with some weights) to 
find the probability that this star is predicted by the model. 
LOSIfc jQ1 is essentially the integral down the line-of-sight of 
the probability of a given orbit giving the observed value of 
u. With increasingly precise data, the number of orbits for 
which this probability is non-negligible anywhere along the 
line of sight diminishes. 

The computation of a star's likelihood proceeds by con- 
sidering each point along the line of sight to the star, and 
then for each orbit finding the velocities that a star on that 
orbit will have at that point. Each such velocity then con- 
tributes to the probability a factor proportional to 



exp 



(V 



2a 2 



2a 2 



(34) 



As we proceed along the line of sight, the observables (uy , /x) 
predicted by the orbit gradually change, so the overall con- 
tribution of the orbit to the likelihood comes from a line 
drawn through the error ellipsoid. The contribution is large 
or small depending on how close the line comes to the centre 
of the ellipsoid. 

When we change potential, our orbits necessarily change 
and the lines through the error ellipsoid change. It may hap- 
pen that our new orbit library has an orbit that yields a line 
that comes very close the centre of the ellipsoid, whereas the 
old orbit library jumped from an orbit that gave a line pass- 
ing closest to the centre say 2a on one side to an orbit that 
passes closest 2a on the other side of the ellipsoid's centre. 
This old library provided no orbit that makes the given star 
very probable even though, with a denser sampling of action 
space, such an orbit would have arisen. Hence with the old 
library rather than the new, the star is declared improbable 
even though it is in fact probable. 

A subsidiary issue is that with the new library some of 
the values J k which had 4>(J k ) = in the original potential 
(i.e. orbits that do not enter the survey volume, and were 
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thus irrelevant to our calculation), have 4>{3k) ^ in the 
new library and have to be considered, and vice versa. 

An indication of how significant these discreteness ef- 
fects are is that changing the potential from one with scale- 
height 300 pc to one with scaleheight 350 pc causes the cal- 
culated values LOSIfc,c« to change by on average a factor 
~ 10. 



4.1 Extent of the problem 

To see the impact of this problem, we now turn to our three 
catalogues described in Section 2.4. Numerical experiments 
show that keeping the same values of J fe as we change poten- 
tials is little better than taking an entirely new Monte-Carlo 
sampling of Jk, so we now explore the case in which we keep 
the same Mcll potential, <&, and the same catalogue of stars, 
but use a new sample of actions Jfe to calculate the likeli- 
hoods (eq. 26). In each case we use the df / truc that was 
used to create the catalogue as both the sampling density 
and the df being tested in equation (26). 

In principle it would be ideal to take a large number of 
Monte-Carlo samples, each containing Nk values of J, de- 
termine the likelihood in each, and directly determine the 
scatter. However this process is prohibitively expensive com- 
putationally, so we use a quicker alternative. We calculate 
the integrals 4>k and LOSIfc, a (equations 24 & 27 respec- 
tively) for 100 000 values of Jfe. Then we choose at random 
subsets of 12 500 or 25 000 or 50 000 values of Jfe and use 
these values to calculate for each star a the standard devi- 
ation a (log £■") of the resulting values of log£J. 

Fig. 3 shows the distributions of these standard devi- 
ations for each library size and each quality of data. In- 
creasing the quality of the data broadens the distribution 
of a(\og£") and shifts its mean to higher values. Even for 
iVfe = 50 000 all stars have non-negligible values of a(log£°) 
and only in the case of the highest-quality data do the distri- 
butions have a heavy tail to large values. Thus the problem 
is not so much the existence of a few outliers but that our 
Monte-Carlo sums give rise to excessive uncertainty in the 
likelihoods of all stars. 

Note that the largest torus libraries considered are more 
than an order of magnitude larger than the orbit libraries 
used for typical Schwarzschild modelling of external galaxies 
(e.g. van den Bosch et al. 2008), though we do not "dither" 
orbits, which is an approach that can significantly increase 
the effective resolution of Schwarzschild models. 

From Fig. 3 it is evident that a(\ogC") can be beaten 
down by increasing the number Nk of orbits in one's library. 
In Paper I we used the Shannon entropy to measured the 
extent to which an individual observation u° is probed by 
a given library of tori. The entropy is 



(35) 



where is the fraction of the calculated likelihood con- 
tributed by the fcth torus. This is 



Pk 



LOSIfe 



EfeLOSIfe, 



(36) 



Clearly S a = if there is only one contribution, and 
S a = In N if N tori provide equal contributions to the star's 



° O.l 

b 



0.01 



I ' — I ' — 

a ^- 0.2 mas/yr, 

a v - 5 km s -1 , 
a v = 0.2 mas 

ct m = 0.2 mas/yr, 

ct^= 0.2 mas 

a^= 0.2 mas/yr 

« N-'/ 2 




j 



100 



1000 



Figure 4. Torus modelling: standard deviations of a stars' likeli- 
hood, cr(Ct) as a function of the effective number of tori contribut- 
ing to the calculation of each likelihood (iV e fj „ eq. 37). The three 
star catalogues, which differ in the completeness of their data, 
are represented by solid, short-dashed or dotted lines. There are 
three lines for each catalogue in the figure, corresponding to three 
different sizes of torus library, N k = 12500, 25000 and 50000. The 



long-dashed line is cr(C-t) 
N c ff * > 10 quite well. 



: 0.9 N 



-1/2 



, which fits all points with 



probability. We can therefore define an effective number of 
contributing values Jfe, 



K s ^=ex P (S a ), 



(37) 



which is the number of tori with equal -p% that would give 
an entropy S a . Clearly we expect that, for a given library 
size Nk, the typical value of N" s will become smaller as 
the observational data become more precise. 

Figure 4 plots the standard deviation a (log £») against 
the effective number N" s „ derived from the average of the 
entropies S a over a sample of equivalent torus libraries. We 
see that all data points lie close to a universal relation, that is 
independent of library size or data quality. Moreover, this re- 
lation asymptotes to the relation <r(log£*) oc \/^eff,*- This 
result conclusively proves that the scatter in likelihood val- 
ues is generated by Poisson noise, and enables us to predict 
how large N" s „ needs to be to beat the noise down to any 
given level for any data set. 

Figure 5 shows how the errors in individual star like- 
lihoods combine to produce errors in model likelihoods C 
by showing histograms of the offsets A(log£) between log£ 
and its mean over all torus libraries. These distributions are 
roughly Gaussian so their widths are characterised by the 
standard deviation a (log C) of log£. 

Figure 6 shows cr(log£) as a function of torus library 
size iVfe for all three catalogues. The more precise the data, 
the larger cr(log£) is, but in each case cr(log£) declines ap- 
proximately in proportion to \/Nk, but is still ~ 10 even for 
our largest library and our lowest quality data. 
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Figure 3. Torus modelling: histograms of the standard deviations <r(log£») calculated from torus libraries of various sizes (as labelled) 
for each of the 10 000 stars in each of our three catalogues (also as labelled) . 



, - 0.2 mas/yr 

- 5 km/s, 
. = 0.2 mas 



50000 tori 
25000 tori 
12500 tori 



I 1 1 I 1 1 I 1 1 , I 

_ o ^ — 0.2 mas/yr 


i . . 1 . . i . 
50000 tori - 


" ct„ = 0.2 mas 






25000 tori - 




12500 tori - 


, , , — r~~r'is, , i , 





: 0.2 mas/yr 



50000 tori - 

25000 tori - 
12500 tori - 



A(log L) 



A(log L) 



A(log L) 



Figure 5. Torus modelling: histograms of the value of log£ calculated for each of our three catalogues. Each value of log£ is computed 
10 000 times using a different torus library. Distributions are shown for three sizes of torus library: = 12 500,25 000 or 50 000 tori. 
From each value of log£ we have subtracted the mean value of its set. 



4-1.1 Implications for determination of <E> 

The key question is: will this uncertainty on the value 
of C prevent us from making useful inferences about the 
Galactic potential? To answer this question we now use 
tori to determine the likelihoods of our three catalogues 
given various choices of potential. In Figure 7 we show the 
peak values of log£ calculated for models constrained to 
have two pseudo-isothermal discs in potentials with vary- 
ing disc scalelengths (upper row) and scaleheights (lower 
row) found from a Monte-Carlo sum over 100 000 values J a, 
(i.e. 100 000 tori). The true potential has disc parameters 
(J?d,2d) = (3, 0.3) kpc. We show (approximate) error bars 
on each value - these are found by extrapolating the rela- 
tionship shown in Figure 6 to N k — 100 000. Note, therefore, 
that this is the uncertainty on C due to the imperfect anal- 
ysis, and is not intrinsic to the observational data. 

In each case, the models with Rd = 2.5 can clearly be 
ruled out, but the models with Rd = 3.5 have the same C as 
the true potential to within the error bars. The results when 
varying Zd are even less encouraging, as all the calculated 
values of C agree to within the error bars, and the incorrect 
potentials (z d = 0.25 or 0.35) are sometimes preferred to 
the true potential. Clearly the true difference in log£ is 
significantly smaller than the uncertainty. 



These calculations would require over a week on a single 
desktop cpu for each potential (though they are simple to 
parallelise). The majority of this time is spent calculating 
the line-of-sight integrals LOSIfc, Q . 

4.2 Why could we determine the DF? 

Our aim in this section is to explain clearly why the torus- 
based method of Paper I could determine the DF to good 
precision but fails here when extended to determination of 
$. To this end we consider the toy problem illustrated in 
Figure 8. We use the Monte-Carlo principle to estimate the 
ratio of two integrals Ii of functions r/i(x) of one variable that 
we know to be products of the Gaussian of unit dispersion 
and zero mean and functions fi (x) that vary on scales larger 
than unity. Thus 

h = I r ll (x)Ax = j di,/j(i)xG(i,0,l), (38) 

where for the /; we adopt 
fi(x) = 1 + 0.01 a; 

f 2 (x) = 0.95 + 0.03 a;. (39) 

With these choices the true values are h = 1 and I2 = 
0.95. The Gaussian represents a star's error ellipsoid and 
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Figure 7. Torus modelling: differences in the largest values of log likelihood obtained by varying the DF in a given $ and the value for 
the true <!>. The DF /(J) is assumed to be of the form (2). In the upper row the scalelength of the disc that contributes to $ is varied, 
while in the lower row the its scaleheight is varied. The true values are (R^,z^) = (3,0.3)kpc. The computations use Nf. = 100 000 
tori. The error bars are approximate, and found from an extrapolation to JVfc = 100 000 of the relations shown in Fig. 6, assuming that 
(rlog£oc N~ 1/2 . 
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Figure 6. Torus modelling: the standard deviations of the his- 
tograms plotted in Fig. 5 as functions of the number Nk of tori 
employed. The red long-dashed lines have slope —1/2 so we see 



that the uncertainty in log C declines approximately as N 



1/2 



the fi represent candidate dfs. Figure 8 shows the process 
of Monte-Carlo evaluation of the U. In the top left panel I\ 
is found to be Ii = 1.129 rather than its true value, unity. In 



the top right panel independently sampled points are used 
to estimate that I2 = 1.206, so the ratio of the integrals is 
J2//1 = 1.068 rather than 0.95 as it should be. The lower 
panels show re-evaluations of I2 using points that are not 
independent of those used to evaluate I\ : in the bottom-right 
panel we use exactly the same points to find I2/ 1\ = 0.953, 
within 0.4% of its true value, while in the bottom left panel 
we use points that are shifted by 1.5 to the left, and find 
h/h = 0.860. 



This experiment shows that if we use the same sam- 
pling points we can determine the ratio of two integrals like 
those we have to evaluate to obtain much more accu- 
rately than we can determine either integral individually 
because the Poisson noise in the evaluation largely cancels 
from the ratio. If we use different sampling points to eval- 
uate each integral, the Poisson noise does not cancel and 
the ratio is even less accurate than the individual integrals. 
In Paper I we used one set of sampling points to evaluate 
the likelihood of every DF, so the Poisson noise made little 
contribution to the differences of the log likelihoods of the 
dfs considered, and we could identify the true DF accurately. 
In Section 4.1 we were obliged to vary the sampling points 
between potentials and the Poisson noise in the differences 
of log likelihoods degraded performance to an unacceptable 
extent. 
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Figure 8. Toy example analogous to comparing the likelihoods 
of given data in different potentials. In each case we determine an 
integral Ii (eq. 38) by Monte-Carlo summation over 20 points 
taken randomly from the range —5 < x < 5. In the top-left 
panel we show the function values r?i used to find I\ (in the other 
panel these values are shown as dots, for comparison). The sum 
yields I\ = 1.129 rather than unity. In the other three panels we 
show the summed values of r)2 when we have either completely 
resampled the values of x (top right), or have shifted all the values 
by A x = — 1.5 (with any points that fall below x = —5 replaced 
by ones at x > 3.5, bottom left), or have used the same values of 
x as were used for Ii (bottom right). Although these last points 
do not yield a particularly accurate value of I2 they do yield an 
accurate value for the ratio I\/l2- 



5 THE SOLUTION: USE J(x,v) 

In Section 3.3 we explained how, if we have a way to find 
J(x,v), the can be evaluated using samples of points u' k 
that are chosen specifically for each star in the catalogue and 
are never varied. We also explained that a second sample of 
points should be used to evaluate the normalising constant 
A for all potentials considered. We now show that this ap- 
proach dramatically reduces the numerical noise that was so 
prominent in Section 4.1. 

In summary, the scheme is: 

(i) For the calculation of A, sample Na points (x, v) from 
the sampling density /s(x, v) (ignoring any that lie outside 
the survey volume). 

(ii) Sample iV u / points for each star from the sampling 
density £(u'|u a ), (eq. 31). 

(iii) Choose a gravitational potential and determine J 
for each of the Na points used to determine A and each of 
the N a x JV U / points used to then determine C. 

(iv) Maximise C in this potential by varying the param- 
eters of the df /(J). 

(v) Return to step (iii), choosing a new potential. 

This process is orders of magnitude faster than the torus 
approach, so we are able to carry out many more tests. 





vary z d 



stabilised A 



vary z d 



unstabilised A 



Figure 9. Using J(x, v) when the data are error-free: Differences 
between the largest value of log C obtained for a candidate <E> and 
the value obtained for the true $ as we vary either the scale- 
length of the potential-generating disc (upper) or its scaleheight 
(lower). The left panels show results obtained using the fixed sets 
of sampling phase-space points throughout, while the right col- 
umn shows the effect of choosing new points for the estimation of 
A for each trial Note that the range of log£ on the y-axis is 
less than half that in Fig. 7. 



As a proof of principle, we show in the left column of 
Fig. 9 results for the case in which we have perfect observa- 
tional data with the consequence that the Monte-Carlo sum 
for d (eq. 31) requires just one point. These tests are very 
similar to those of Ting et al. (2012), except we have signifi- 
cantly a more complicated (and realistic) selection function 
that requires a more careful Monte-Carlo integration, and 
we do not marginalise over the parameters of our (some- 
what more complicated) df. We use Na = 4 x 10 6 points 
for our normalisation calculation (as opposed to 10 5 used 
by Ting et al: priv. comm.). Note that the equivalent cal- 
culation is essentially impossible to perform correctly with 
an orbit library, as the probability of an orbit in the library 
passing precisely through the observed phase-space location 
of a star is zero. 

The top left panel of Fig. 9 shows the effect of systemat- 
ically varying the scalelength Rd of the disc that contributes 
to $ around its true value Rd = 3kpc, while the lower left 
panel shows the effect of systematically varying the poten- 
tial's scaleheight around its true value 2<j = 0.3 kpc. The 
data points now reveal both Rd and Zd to good precision. 

The right panels of Fig. 7 show the importance of pre- 
venting the Poisson noise in our estimate of A from scat- 
tering the data points by showing the points one obtains 
when new sampling points (x, v) are chosen for each trial 
$. The noise has a totally devastating impact on our ability 
to deduce z d - 

When we resample for each potential, we can determine 
the uncertainty in log likelihood simply by repeating the ex- 
periment several times for the same data set and determin- 
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Table 3. Uncertainties in zj. The intrinsic uncertainty is the 
uncertainty due to the finite size and observational accuracy of 
the catalogue. We find this value by fitting a Gaussian in to C. 
The numerical uncertainty is the uncertainty introduced by the 
limited numerical precision of the integrals used to find C. 



Data 


Best fit 


Intrinsic 


Numerical 




2d 


uncertainty 


uncertainty 


Exact 


0.292 


0.019 


0.002 


fj,, f || & VJ 


0.294 


0.021 


0.006 


fl & w 


0.272 


0.032 


0.012 




0.313 


0.036 


0.017 



ing the standard deviation of the recovered values. When we 
do not resample, this approach is inadequate - the expected 
improvement is in the accuracy of the relative log likelihoods 
found. We therefore determine error bars by fixing the log 
likelihood found for the (known) true potential as zero in 
each experiment and finding the scatter in the relative value 
found in each given potential. 

Figure 10 shows results obtained when we use J(x, v) 
and fixed phase-space sampling points to analyse our three 
catalogues of varying completeness. The improvement over 
the results shown in Fig. 7 is dramatic - note that the scale 
in log£ for plots with varying z d is an order of magnitude 
larger in Figure 10 than in Figure 7. The uncertainty in 
the difference between the log£ values of potentials that 
differ by 25 pc in their values of Zd is now as small as ~ 0.1. 
With this level of uncertainty in log C differences, it becomes 
possible to constrain the value of Zd strongly. Extrapolating 
the trend shown in Figure 6 suggests that achieving the same 
precision with tori would require the use of ~ 10 9 to 10 10 
tori for the three catalogues we consider. 

We can quantify both the intrinsic uncertainty in <3> as- 
sociated with our catalogues and the additional uncertainty 
produced by the Monte-Carlo integration. For the varying 
scaleheights (Fig. 10 lower panels) we can fit the values of 
log£ to a quadratic in z d (i.e. approximate C as Gaussian in 
Zd). We can then read off the most likely Zd, and its uncer- 
tainty g 2 d , which is very close to the intrinsic uncertainty. 
If we do this many times (for many different Monte-Carlo 
sums) we can compare the most likely Zd found in each case 
and find the scatter in these values, which is the uncertainty 
associated with the Monte-Carlo integration. 

Table 3 gives these uncertainties. With only 10 000 stars 
we can determine Zd with intrinsic uncertainty of less than 
36 pc with only proper motion data, or less than 20 pc when 
line-of-sight velocities and parallaxes are also available. In 
each case the uncertainty introduced by the Monte-Carlo 
sums is significant smaller than the intrinsic uncertainty. 
Since the volumes of the error ellipsoids increases as the 
completeness decreases, to achieve a given precision more 
points are required in the Monte-Carlo sums (larger N u >) 
when the data are incomplete than when they are complete. 

For varying scalelengths, it's clear that in the range 
analysed, C is not well approximated by a Gaussian in Rd- 
The range in log£ found is much larger than in the scale- 
height case. With this analysis we can therefore only reason- 
able say that the uncertainty in Rd is significantly smaller 
than 250 pc. 



Determining one of these likelihoods requires the calcu- 
lation of ~ 10 7 values of the actions in a given potential, 
a process which takes ~ 10 minutes on an ordinary desk- 
top cpu, and can easily be parallelised. This is ~ 10 4 times 
faster than the calculations using tori of Section 4.1. In fact, 
to achieve with tori the same precision we have achieved us- 
ing J(x,v) would demand ~ 10 s more cpu cycles than were 
used for this section! 



6 DISCUSSION 

There are three different kinds of uncertainty associated 
with this work 

• Irreducible statistical uncertainty. This is the uncer- 
tainty associated with the limited number of stars in the 
catalogue and their non-zero measurement errors. 

• Numerical noise associated with the limited accuracy 
with which we evaluate integrals over the df. 

• Systematic errors associated with (i) inaccuracies in the 
transformations between angle-action and ordinary phase- 
space coordinates, and (ii) the use of particular functional 
forms for the dfs and potentials that we fit to the data. 

We have found that the key to reducing the numerical 
noise to the point where it is possible to successfully de- 
termine the Galactic potential from a star catalogue is (i) 
evaluation of the df at points that are fixed in the space 
of observables u, and (ii) clustering these points within the 
error ellipsoid of each observed star, so we are sure to evalu- 
ate the df throughout the region of phase space where each 
star might lie. 

Given that the initial conditions of an orbit can be con- 
sidered its integrals of motion, it might be argued that our 
prescription is readily implemented within the context of 
Schwarzschild modelling: we build our orbit library by inte- 
grating orbits from initial conditions that strategically cover 
each star's error ellipsoid. 

One way to see the fatal weakness of this idea is reductio 
ad absurdum: we consider the limiting case of perfect data. 
Then only one orbit will be required to cover each star's error 
ellipsoid, and when we assign unit weight to each orbit, we 
will obtain perfect agreement with the data regardless of 
what potential we choose because the orbit started for one 
star has zero probability of being sampled at the location 
of another star. The potential can be constrained only to 
the extent that each orbit contributes non-trivially to the 
likelihood of more than one star. 

Chaname et al. (2008) achieve this goal by explicit bin- 
ning of the model on the sky. This binning operation is es- 
sentially the means by which the density in the space of ob- 
servables is constructed out of the otherwise uninformative 
orbital weights. The usefulness of binning decreases rapidly 
as the dimension d of the space or the data increases. We are 
currently considering the case d — 6, but once we include 
crucial spectral information in our models, d rises to d — 10 
and beyond (e.g. Binney 2011). 

An alternative approach to the problem of determining 
the Galactic potential is to use Jeans' equations (e.g. Bin- 
ney & Tremaine 2008, §4.8) to relate gradients in the density 
and velocity dispersion of a suitable tracer population to the 
gradient in the potential. Recent studies using this method 
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Figure 10. Results obtained with J(x, v) with data of varying completeness. For each of our three catalogues studied in Fig. 7 we 
plot the largest value of log£ minus that obtained for the true potential as either the scale length of the disc that contributes to 3> 
is systematically varied (upper row) or the disc's scalehcight z<j is systematically varied (lower row, note that the range of log C on 
the y-axis is an order of magnitude smaller than in the equivalent plots in Figs 9 & 7). The number of points used for each star was 
N u r = 1000 for the catalogue with measured /i, and vj and N u i = 2000 points for the two catalogues with less complete data. 



include Garbari et al. (2012) and Bovy & Tremaine (2012) - 
though it should be noted in the latter case that the velocity 
dispersions assumed were biased by up to a factor of 2 and 
had materially under-estimated errors (Sanders 2012b), so 
the quoted results will also be biased and offer spurious pre- 
cision. Since this approach relies on the gradient in density, 
it is particularly susceptible to errors in the density profile, 
which become more likely for survey data with complicated 
selection effects, as the selection criteria are typically mag- 
nitude and colour, and vary with position on the sky. 

Our ability to diagnose <3> depends crucially on compo- 
nents of our df contributing to the likelihood of more than 
one star (e.g. Magorrian 2013). When J(x,v) is available, 
sampling the error ellipsoids of stars works because our df 
/(J) is conjectured from the outset rather than estimated 
by binning products of weights and orbital probabilities. Be- 
cause we require / to be a smooth function of J, a change in 
the value of / at the actions of one star changes the value of 
/ at the actions of many other stars in a way that depends 
on <3>. It is this principle that provides diagnostic power. 

Our choice of parametrised form for /(J) is therefore 
crucial. An excessively flexible form will simply fit the noise 
in the data. A badly chosen or insufficiently flexible form will 
produce biased results. For example, if we perform the tests 
with varying Zd as in Section 5, except that we only allow 
/ to consist of a single quasi-isothermal disc (as opposed to 
the two discs it actually comprises), we are strongly biased 



towards low values of Zd- dfs of the type used here have 
been shown to provide good fits to observational data in the 
Solar neighbourhood Binney (2010, 2012b), but it is clear 
that one must be careful not to over-constrain them at the 
expense of biasing estimates of $. 

As Magorrian (2006, 2013) has stressed, <£> should re- 
ally be found by marginalising over the df rather than by 
finding the pair (/, $) that maximises the likelihood of the 
data. In statistical problems we often take the shortcut of 
seeking the most likely value of some variable rather than 
the variable's expectation value, but the justification for this 
step has to be that the probability distribution is so sharply 
peaked around the most likely value that these two values 
are effectively indistinguishable. The classic example of how 
misleading this assumption can be, is provided by the ther- 
mal equilibrium of a macroscopic object, such as a diamond 
of N atoms. Since the probability that any of the diamond's 
normal modes is in its ith excited state of energy Ei is pro- 
portional to the Boltzmann factor exp(— Ei/ksT), the di- 
amond's ground state, in which all normal modes are un- 
excited, is by far the most probable state regardless of the 
temperature T. Yet a real diamond has negligible probabil- 
ity of being in its ground state: it is certain to be in a state 
that is higher in energy by ~ 3NksT. The actual state is ex- 
traordinarily improbable, but there are so many states like 
it, that we can be certain the diamond is in one of them and 
not its enormously more probable ground state. 
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Thus it is dangerous to suppose, as we have done, that 
the Galaxy's potential is the member of the pair (/, $) that 
has the highest probability: this may be a singular pair and 
nearly all the probability is associated with materially dif- 
ferent pairs (/',$'), so these pairs would dominate the ex- 
pectation of $ if we marginalised over the df. The key to 
this marginalisation is knowing how to sample the space of 
all possible dfs. Magorrian (2013) explains how this should 
be done, but we do not yet know whether doing so materi- 
ally changes our conclusion regarding the form of $. Ting 
et al. (2012) marginalised over the parameters of their sin- 
gle pseudo-isothermal df, and found that this made little 
improvement to their results as compared to simply find- 
ing a maximum likelihood. This is, however, still tied to the 
parametrised form of the pseudo-isothermal df, and there- 
fore does not really answer the question. 

It is encouraging that the formulae for J(x,v) intro- 
duced by Binney (2012a) are accurate enough to perform 
the analysis in Section 5 without biasing the results on 
the investigated scales. However, they are neither as gen- 
eral nor as accurate as the principle of torus construction - 
the latter is a systematic approximation scheme whose ac- 
curacy can be ramped up at will. Binney's formulae are by 
contrast fixed: their accuracy cannot be systematically in- 
creased. They were introduced and validated in the context 
of the orbits of disc stars in the solar neighbourhood, and it 
not entirely clear why they work as well as they do for these 
orbits. Work needs to be done to optimise the extension of 
these formulae to the orbits of bulge and halo stars. Sadly, 
there is scant prospect that these formulae can be extended 
to the rotating non-axisymmetric potential of the Galactic 
bar, so the conclusion that our ability to diagnose $ hangs 
by the slender thread of these formulae is a worrisome one. 

Fortunately, values J(x,v) can be obtained from tori: 
given a trial potential and a point (x, v) we estimate (0, J), 
perhaps from Binney's formulae, and construct a trial torus. 
Then as described in McMillan & Binney (2008) we itera- 
tively adjust J until we obtain a torus that passes through 
the given phase-space point. This procedure will be more 
costly than that used in Section 4.1 by a factor of a few be- 
cause several tori will have to evaluated for each sampling 
point u', but the procedure will yield the same precision as 
was achieved in Section 5. 



7 CONCLUSIONS 

A fundamental task of Galactic astronomy is determination 
of the Galaxy's gravitational potential $ because a knowl- 
edge of $ is required for any investigation of the dynamics 
or evolution of the Galaxy. In Paper I we showed that mod- 
els constructed from orbital tori can be used to constrain 
the Galaxy's df to good precision from a catalogue that 
contains only ~ 10 000 stars. In Section 4 we extended this 
approach to the determination of Although the extension 
is straightforward, we found that it is in practice a notable 
failure. We traced the problem to Poisson noise arising from 
the use of a finite number of tori in the analysis. The noise 
level increases with the completeness and precision of the 
data because the number of tori that contribute significantly 
to the likelihood of a given star decreases with the volume of 
the star's error ellipsoid. We showed that to beat this noise 



down to an acceptable level by brute sj N growth one would 
have to use a number of tori that exceeded the number of 
stars we were considering (10 000) by at least four orders of 
magnitude. 

Torus modelling is an extension of Schwarzschild mod- 
elling, so any problem inherent in torus modelling will be 
shared by Schwarzschild modelling - for a detailed compari- 
son of the two techniques see Binney & McMillan (2011). 
Made-to-measure modelling (M2M: Bissantz et al. 2004; 
Dehnen 2009; Morganti & Gerhard 2012) is a modification 
of Schwarzschild modelling in which one does not hold entire 
orbits in memory, and as such will suffer badly from discrete- 
ness noise when used with data that are complete and/or 
precise. Straight N-body modelling has the same problems 
with discreteness noise that M2M modelling has, and in ad- 
dition extreme difficulty in adapting the model to fit the 
data. Thus the discreteness noise we exhibited in Section 4 
is a major issue for all galaxy-modelling strategies that are 
based on orbits. 

In Section 5 we showed that discreteness noise can be 
mastered if we evaluate actions as functions of (x, v) rather 
than the other way round. This is very similar to the ap- 
proach used by Ting et al. (2012), though our consideration 
of more realistic selection effects and non-negligible observa- 
tional uncertainty forces us to deal more carefully with the 
discreteness noise in this case as well. This approach works 
because we only require ratios of likelihoods, and the dis- 
creteness noise will cancel from these ratios if we evaluate 
both likelihoods using the same phase-space points (x, v) for 
the Monte-Carlo sums with which we approximate integrals. 
With this approach we were able to achieve the numerical 
precision to determine the scaleheight of the potential al- 
most as accurately as the data allows, which is to within 20 
to 30 pc for the catalogues of 10 000 stars that we consider. 

Here we failed in our attempt to constrain the potential 
with tori but this failure does not indicate that tori should 
be abandoned. Right now they are invaluable for generat- 
ing models, and in the future they will play a key role in 
forthcoming models of the chemodynamical evolution of the 
Galaxy - recall that our analysis here does not recognise 
chemically distinct populations, which are in reality central 
to studies of the structure and history of our Galaxy. More- 
over, it is possible to upgrade our torus machine so it can 
evaluate J(x, v). The brute-force way to do this is simply 
to construct tori iteratively until one has constructed the 
one that passes through a given phase-space point (x,v) 
(McMillan & Binney 2008), but a computationally much 
faster technique may be possible: currently we exploit each 
torus in isolation and given observables u for a star, we find 
the phases 9 at which a star on a given torus J passes close 
to u. By interpolating between tori it should be possible to 
find the (8, J) combination that brings a star to u. 
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APPENDIX 

In Table 4 we give the parameters of the gravitational po- 
tential models with varying f?d used for the tests shown in 
Figures 7, 9 & 10. In each case the the potential is referred 
to in the text by its thin disc scalelength. 
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2d, 0, thin 


^d.thin 


^d, thin 


^d,0, thick 


Rd,thick 


2 d. thick 


Pb,0 


Ph,0 


Hi 


(M pc- 2 ) 


(kpc) 


(kpc) 


(M pc- 2 ) 


(kpc) 


(kpc) 


(M pc" 3 ) 


(M© pc- 3 ) 


(kpc) 


740.3 


2.25 


0.30 


161.7 


2.62 


0.90 


86.1 


0.0271 


12.5 


704.1 


2.50 


0.30 


163.4 


2.92 


0.90 


86.5 


0.0214 


13.8 


851.3 


2.75 


0.30 


199.8 


3.21 


0.90 


92.4 


0.0127 


17.0 


753.0 


3.00 


0.30 


182.0 


3.50 


0.90 


94.1 


0.0125 


17.0 


673.8 


3.25 


0.30 


165.0 


3.79 


0.90 


98.7 


0.0232 


11.7 


505.8 


3.50 


0.30 


128.5 


4.08 


0.90 


99.7 


0.0322 


10.4 


410.5 


3.75 


0.30 


103.7 


4.38 


0.90 


99.4 


0.0613 


7.5 



Table 4. Parameters of the potentials with varying used in this paper (including the Mcll potential, with i?<j thin = 3.0 kpc). In 
each case the parameters were fit in the same way as Mcll, as described in McMillan (2011) with the assumed Solar radius Ro = 8.5 kpc 
and with the ratio of the thick and thin disc scalelengths fixed at 3.5/3. 
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